Use of formants in stressed and unstressed continuous speech recognition
Authors
Abstract
Stress plays a crucial role in how human listeners understand speech. However, automatic speech recognition performance deteriorates in the presence of stress because of the changes it causes in the speech parameters. Moreover, because stress appears in speech in so many different forms, a speech corpus that covers the majority of stress conditions is difficult to obtain in the real world. Therefore, other ways of improving stressed speech recognition performance have to be considered. In previous work, we evaluated the effects of stress on several speech parameters, such as phone durations, pitch, and formant frequencies. This paper discusses the use of formants in stressed speech recognition. We have found that formants and their dynamics (slopes) are useful in improving speech recognition rates in both stressed and unstressed conditions.
Similar resources
Modeling lexical stress in continuous speech recognition for Dutch
The acoustic realization of vowels with lexical stress generally differs substantially from their unstressed counterparts, which are more reduced in spectral quality, shorter in duration, weaker in intensity and tend to have a flatter spectral tilt. Therefore, in a continuous speech recognizer (CSR) it would appear profitable to train separate models for the stressed and unstressed variants of ...
Using lexical stress in continuous speech recognition for Dutch
The acoustic realization of vowels with lexical stress generally differs substantially from their unstressed counterparts, which are more reduced in spectral quality, shorter in duration, weaker in intensity and tend to have a flatter spectral tilt. Therefore, in an automatic speech recognizer it would appear profitable to train separate models for the stressed and unstressed variants of each v...
Characteristics of Contrast between the Stressed and the Unstressed in Rhythm Units Observed in Duration Structure in English Speech by Japanese Learners
English rhythm is related to the contrast between stressed and unstressed syllables in duration structure. In native English speech, in general, the intra-speaker average duration of stressed syllables is longer than that of syllables as a whole. Conversely, that of unstressed syllables is shorter than that of syllables as a whole. In the previous paper by the present author, it was reported that str...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...